In this paper we explore the concept of hierarchy as a quantifiable descriptor of ordered structures, starting from three conditions that a hierarchical structure must satisfy: order, predictability and pyramidal structure. According to these principles we define a hierarchical index using concepts from graph and information theory. This estimator quantifies the hierarchical character of any system that can be abstracted as a feedforward causal graph, i.e., a directed acyclic graph defined in a single connected structure. Our hierarchical index is a balance between the predictability and pyramidal conditions, obtained through the definition of two entropies: one attending to the onward flow and the other to the backward reversion. We show how this index allows us to identify hierarchical, anti-hierarchical and non-hierarchical structures. Our formalism reveals that, starting from the defined conditions for a hierarchical structure, feedforward trees and inverted tree graphs emerge as the only causal structures of maximally hierarchical and anti-hierarchical systems, respectively. Conversely, null values of the hierarchical index are attributed to a number of different network configurations: from linear chains, due to their lack of pyramidal structure, to fully connected feedforward graphs, where the diversity of onward pathways is canceled by the uncertainty (lack of predictability) when going backwards. Some illustrative examples are provided to distinguish among these three types of hierarchical causal graphs.

The idea of hierarchy has been attributed to a wide and disparate range of systems and, although easily perceived, its quantification is not a trivial issue. In this work we quantify the hierarchy of a given causal structure with a feedforward organization. Starting with the representation of a system of causal relations as a graph, we define a non-heuristic measure of hierarchy strongly grounded in the principles of information theory. We start from the definition of the conditions for a system to be considered perfectly hierarchical: a pyramidal structure with a completely predictable reversion of the causal flow. In this context, a hierarchy index is defined by weighting how far a given feedforward structure is from these conditions. As we shall see, structures that fully satisfy this property belong to a special class of trees. Our estimator establishes a quantitative criterion for the definition of hierarchic, non-hierarchic and anti-hierarchic networks.



I. INTRODUCTION 

The existence of some sort of hierarchical order is an apparently widespread feature of many complex systems, including gene [33] and human brain [2] [19] networks, ecosystems [12] [20], social and urban structures [15], the Internet [31] or open-source communities [30]. The presence of such underlying order in the multiscale organization of complex systems is a long-standing hypothesis [32], giving rise to the idea of hierarchy as a central concept (see also [13]). Although usually treated only in qualitative terms, some formal approaches to the problem have been proposed. Efforts towards a well-defined quantification of hierarchical order have advanced by means of complex network theory, where dedicated work has been devoted to the proper identification of hierarchical trends. One outcome of these efforts has been a number of powerful, heuristic
measures |1 [17] [23] [25] [29] [31] . 

Often, a nested organization (formally identical to order, in set-theoretical terms [16]) can be identified as the basis of hierarchical order. If we think of hierarchy in these terms, we might agree with Herbert Simon that "it is a commonplace that nature loves hierarchies" (cited in [21]). Many examples fit this picture. Within the context of matter organization, molecules are made of atoms, which result from the combination of elementary particles, some of which also have internal subunits (such as quarks). Similarly, the relation of characteristic scales of organization by inclusion of one in another, like Chinese boxes or Matryoshka dolls, has been seen as a sort of hierarchical organization. Another relevant example is the case of fractal structures, which naturally define a hierarchy of self-similar objects. Finally, within the context of taxonomy, spin glasses or optimization theory, the use of ultrametricity has also made it possible to define hierarchical order [22].

Biological hierarchies have also evolved through time as part of a process that generates high-level entities out of nesting lower-level ones [HI [18]. However, taxonomic classification trees are probably the most obvious representation of a hierarchy based on inclusion. In biology, living beings are individuals grouped in taxa according to their characteristics. Starting from species, a nested hierarchy is defined where species belong to genera, which are included within families forming orders, and so forth. Here, every organism is, in principle, unambiguously classified and therefore no uncertainty can be associated with the process of classification.

FIG. 1: Order and hierarchy. a) Total or complete order, either defined by inclusion displaying nested ensembles, $v_1 \subset v_2 \subset v_3 \subset v_4$, or in terms of the order relation $v_1 > v_2 > v_3 > v_4$ (left). The direct or immediate relation depicted by its causal graph (right). b) The ideal hierarchy structure assumed in this work, represented by its nested organization (left) and its causal graph (right). c) An example of a partial order defined by inclusion (left) and its respective causal graph (right).

As an alternative to the view of nestedness, a hierarchical organization can be defined in a structure of causal relations. Paradigmatic examples are the flowchart of a company or the chain of command in the army, where the concept of authority defines a particular case of causal relation. Causality induces an asymmetrical link between two elements, and this asymmetry, either by inclusion or through any kind of causal relation, defines an order. We observe that any circular or symmetrical relation between two elements violates the concept of order, thereby intuitively losing its hierarchical nature. From this perspective, any feedforward relation is potentially hierarchical. Such feedforward structures pervade a diverse range of phenomena and structures, from phylogenetic trees to river basins.

A common feature of most of the approaches mentioned above is that the notion of hierarchy is basically identified with the concept of order. Order is a well-defined concept in mathematics [16, 28], but is order enough to grasp the intuitive idea of hierarchy? Can we actually define what hierarchy is? Quoting Herbert Simon [27]:

(...) a hierarchical system -or hierarchy- can 
be defined in principle as a system that is 
composed of interrelated subsystems, each of 
the latter being also hierarchic in structure 
until the lowest scale is reached[S^]. 

This definition does not provide a clear formalization of hierarchy as a measurable feature, although it certainly grasps the intuitive idea of hierarchy. How can such a general measure be defined? It is reasonable to assume that we have a hierarchy if there is no ambiguity (or uncertainty) in the chain of command followed from any individual to the chief in the flowchart. We shall call this feature the definiteness condition. This is also valid for nested structures. One might think that this condition is fulfilled by a single chain of command or by the matryoshka dolls. However, such structures are already defined within order theory as totally ordered (see fig. 1a). Order and hierarchy are closely related but they are not essentially the same. In this paper we reserve the word hierarchy to designate a concept that goes beyond the definition of order. We argue that the difference stems from the fact that a hierarchical structure must also satisfy a pyramidal organization constraint (see fig. 1b). In other words, the lower the layer of organization, the larger the number of entities it contains. But what happens when this pyramidal structure is inverted? Intuitively, such structures would not be hierarchical but anti-hierarchical [35]. In this work we show how information theory naturally provides a suitable framework to characterize hierarchy in causal structures. Within this theoretical apparatus we provide a rigorous definition of the hierarchy index for causal structures and show how it is applied to some illustrative examples, establishing a distinction among hierarchical, non-hierarchical and anti-hierarchical structures.



II. DIRECTED GRAPHS, ORDERED GRAPHS 
AND CAUSAL GRAPHS 



In this section we will present the basic theoretical 
background used in this paper, grounded both in order 
and graph theories. At the end of this section we will 
formally define the causal graph, the key concept of this 
theoretical framework where the proposed hierarchy mea- 
sures are applied. 
A. Basic concepts of order

Hierarchy is undoubtedly tied to order. This is why we briefly review order theory, highlighting some features that have been commonly attributed to hierarchy. The first task is to define an ordered pair between two elements $a_k, a_j$ of a given set $A$, to be written as $a_k > a_j$, $\langle a_k, a_j \rangle$ or, in a formally equivalent way:

$$\langle a_k, a_j \rangle = \{\{a_k\}, \{a_k, a_j\}\}.$$

This latter formalization explicitly defines order from an inclusion relation [16]. This immediately connects to standard views of hierarchical systems, as we already mentioned, in which inclusion relations are considered essential. Having defined an ordered pair, we define an order relation. Let $A = \{a_1, \ldots, a_n\}$ be a countable, finite set and $\mathcal{R} \subseteq A \times A$ a relation. Such a relation is an order relation (rigorously speaking, a strict partial order) if the following conditions hold:

i) $\langle a_k, a_k \rangle \notin \mathcal{R}$,

ii) $(\langle a_i, a_k \rangle \in \mathcal{R}) \Rightarrow (\langle a_k, a_i \rangle \notin \mathcal{R})$,

iii) $(\langle a_i, a_k \rangle \in \mathcal{R} \wedge \langle a_k, a_j \rangle \in \mathcal{R}) \Rightarrow (\langle a_i, a_j \rangle \in \mathcal{R})$.

We finally define two subsets of $A$ from the definition of order relation, which will be useful to characterize the kind of structures studied in this paper. The set of maximal elements of $A$, to be written as $M \subseteq A$, is defined as:

$$M = \{a_k \in A : \nexists\, a_j \in A : \langle a_j, a_k \rangle \in \mathcal{R}\}.$$

Similarly, the set of minimal elements, to be written as $\mu \subseteq A$, is defined as:

$$\mu = \{a_k \in A : \nexists\, a_j \in A : \langle a_k, a_j \rangle \in \mathcal{R}\}.$$
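As an illustration of the three conditions and of the sets of maximal and minimal elements, the following minimal Python sketch checks them on a small relation stored as a set of ordered pairs. The function names and the example relation are hypothetical and are not part of the formalism.

```python
from itertools import product

def is_strict_partial_order(elements, relation):
    """Check conditions i)-iii) for a relation given as a set of ordered pairs."""
    rel = set(relation)
    irreflexive = all((a, a) not in rel for a in elements)          # i)
    asymmetric = all((b, a) not in rel for (a, b) in rel)           # ii)
    transitive = all((a, d) in rel                                  # iii)
                     for (a, b), (c, d) in product(rel, rel) if b == c)
    return irreflexive and asymmetric and transitive

def maximal_elements(elements, relation):
    """Elements a_k for which no pair (a_j, a_k) belongs to the relation."""
    dominated = {b for (_, b) in relation}
    return {a for a in elements if a not in dominated}

def minimal_elements(elements, relation):
    """Elements a_k for which no pair (a_k, a_j) belongs to the relation."""
    dominating = {a for (a, _) in relation}
    return {a for a in elements if a not in dominating}

# Hypothetical example: a small diamond-shaped strict partial order.
A = ["a1", "a2", "a3", "a4"]
R = {("a1", "a2"), ("a1", "a3"), ("a1", "a4"), ("a2", "a4"), ("a3", "a4")}
```

Here `a1` is the only maximal element and `a4` the only minimal one; adding the pair `("a4", "a1")` would violate condition ii).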

B. Basic concepts of Directed Acyclic Graphs 

Let $\mathcal{G}(V, E)$ be a directed graph, where $V = \{v_1, \ldots, v_n\}$, $|V| = n$, is the set of nodes and $E = \{\langle v_k, v_i \rangle, \ldots, \langle v_j, v_l \rangle\}$ is the set of arcs, where the order $\langle v_k, v_i \rangle$ implies that there is an arc in the direction $v_k \to v_i$. Given a node $v_i \in V$, the number of outgoing links, to be written as $k_{out}(v_i)$, is called the out-degree of $v_i$, and the number of ingoing links of $v_i$ is called the in-degree of $v_i$, written as $k_{in}(v_i)$. The adjacency matrix of a given graph $\mathcal{G}$, $A(\mathcal{G})$, is defined as $A_{ij}(\mathcal{G}) = 1 \Leftrightarrow \langle v_i, v_j \rangle \in E$, and $A_{ij}(\mathcal{G}) = 0$ otherwise. Through the adjacency matrix, $k_{in}$ and $k_{out}$ are computed as

$$k_{in}(v_i) = \sum_{j \leq n} A_{ji}(\mathcal{G}); \qquad k_{out}(v_i) = \sum_{j \leq n} A_{ij}(\mathcal{G}). \tag{1}$$

Furthermore, we will use the known relation between the $k$-th power of the adjacency matrix and the number of paths of length $k$ going from a given node $v_i$ to a given node $v_j$. Specifically,

$$\left(A^k(\mathcal{G})\right)_{ij} = \big(\underbrace{A(\mathcal{G}) \times \cdots \times A(\mathcal{G})}_{k \text{ times}}\big)_{ij} \tag{2}$$

is the number of paths of length $k$ going from node $v_i$ to node $v_j$ [IT].

It is said that $v_i$ dominates $v_k$ if $\langle v_i, v_k \rangle \in E$. A feedforward or directed acyclic graph (DAG) is a directed graph characterized by the absence of cycles: if there is a directed path from $v_i$ to $v_k$ (i.e., there is a finite sequence $\langle v_i, v_j \rangle, \langle v_j, v_l \rangle, \langle v_l, v_s \rangle, \ldots, \langle v_m, v_k \rangle \in E$) then there is no directed path from $v_k$ to $v_i$. Conversely, the matrix $A^T(\mathcal{G})$ depicts a DAG with the same underlying structure but with all the arrows (and thus, the causal flow) inverted. The underlying graph of a given DAG $\mathcal{G}$, to be written as $\mathcal{G}_u$, is the undirected graph $\mathcal{G}_u(V, E_u)$ obtained by substituting all arcs of $E$, $\langle v_i, v_k \rangle, \langle v_j, v_s \rangle, \ldots$, by edges, giving the set $E_u = \{\{v_i, v_k\}, \{v_j, v_s\}, \ldots\}$. A DAG $\mathcal{G}$ is said to be connected if for any pair of nodes $v_i, v_l \in V$ there is a finite sequence of pairs having the following structure

$$\{v_i, v_k\}, \{v_k, v_j\}, \ldots, \{v_m, v_s\}, \{v_s, v_l\},$$

with $\{v_i, v_k\}, \{v_k, v_j\}, \ldots, \{v_m, v_s\}, \{v_s, v_l\} \in E_u$.

Given the acyclic nature of a DAG, one can find a finite value $L(\mathcal{G})$ as follows:

$$L(\mathcal{G}) = \max\{k : (\exists\, v_i, v_j \in V : (A^k(\mathcal{G}))_{ij} \neq 0)\}. \tag{3}$$

It is easy to see that $L(\mathcal{G})$ is the length of the longest path of the graph.
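Equations (2) and (3) can be checked numerically on a toy DAG. The sketch below is only illustrative: it assumes nodes labeled $0, \ldots, n-1$, stores the adjacency matrix as nested lists, and uses hypothetical function names.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjacency(n, arcs):
    """Adjacency matrix with A[i][j] = 1 if and only if there is an arc i -> j."""
    A = [[0] * n for _ in range(n)]
    for i, j in arcs:
        A[i][j] = 1
    return A

def path_counts(A, k):
    """(A^k)_ij of equation (2): number of paths of length k from i to j."""
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = identity
    for _ in range(k):
        P = mat_mul(P, A)
    return P

def longest_path_length(A):
    """L(G) of equation (3): largest k for which A^k is not the zero matrix.
    Assumes A describes a DAG, so that all powers from A^n onwards vanish."""
    n = len(A)
    P, L = A, 0
    for k in range(1, n):
        if not any(any(row) for row in P):
            break
        L = k
        P = mat_mul(P, A)
    return L

# Hypothetical toy DAG: a diamond 0 -> {1, 2} -> 3 plus the short-cut 0 -> 3.
A = adjacency(4, [(0, 1), (0, 2), (1, 3), (2, 3), (0, 3)])
```

For this graph there is one path of length 1 and there are two paths of length 2 from node 0 to node 3, and the longest path has length 2.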

Borrowing concepts from order theory [28], we define the following set:

$$M = \{v_i \in V : k_{in}(v_i) = 0\}, \tag{4}$$

to be named the set of maximal nodes of $\mathcal{G}$, with $|M| = m$. Additionally, one can define the set of nodes $\mu$ as

$$\mu = \{v_i \in V : k_{out}(v_i) = 0\}, \tag{5}$$

to be referred to as the set of minimal nodes of $\mathcal{G}$.
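A minimal sketch of equations (1), (4) and (5), assuming the adjacency matrix is stored as nested lists with nodes labeled $0, \ldots, n-1$ (the function names and the example matrix are illustrative only):

```python
def in_degrees(A):
    """k_in(v_i): sum of column i of the adjacency matrix, equation (1)."""
    n = len(A)
    return [sum(A[j][i] for j in range(n)) for i in range(n)]

def out_degrees(A):
    """k_out(v_i): sum of row i of the adjacency matrix, equation (1)."""
    return [sum(row) for row in A]

def maximal_nodes(A):
    """Set M of equation (4): nodes without incoming arcs."""
    return {i for i, k in enumerate(in_degrees(A)) if k == 0}

def minimal_nodes(A):
    """Set mu of equation (5): nodes without outgoing arcs."""
    return {i for i, k in enumerate(out_degrees(A)) if k == 0}

# Hypothetical toy DAG with arcs 0 -> 2, 1 -> 2, 2 -> 3 and 2 -> 4.
A = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
```

In this example nodes 0 and 1 are maximal ($m = 2$) and nodes 3 and 4 are minimal.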

The set of all paths $\pi_1, \ldots, \pi_s$, $s \geq |E|$, from $M$ to $\mu$ is indicated as $\Pi_{M\mu}(\mathcal{G})$. Given a node $v_i \in \mu$, the set of all paths from $M$ to $v_i$ is written as

$$\Pi_{M\mu}(v_i) \subseteq \Pi_{M\mu}(\mathcal{G}).$$

Furthermore, we will define the set $v(\pi_k)$ as the set of all nodes participating in the path $\pi_k$, except the maximal one. Conversely, the set $\tilde{v}(\pi_k)$ is the set of all nodes participating in this path, except the minimal one. Attending to the node relations depicted by the arrows, and due to the acyclic property, at least one node ordering can be defined, establishing a natural link between order theory and DAGs. This order is achieved by labeling all the nodes with sequential natural numbers and obtaining a configuration such that:

$$(\forall \langle v_i, v_j \rangle \in E)(i < j). \tag{6}$$

The existence of this labeling connects order relations with directed acyclic graphs.
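One way to obtain a labeling satisfying condition (6) is a standard topological sort. The sketch below uses Kahn's algorithm, which is a swapped-in textbook technique and not a procedure described in the text; the function name and the 0-based input labels are assumptions.

```python
from collections import deque

def topological_labels(n, arcs):
    """Map each node 0..n-1 to a label in 1..n such that label[i] < label[j]
    for every arc (i, j). Raises ValueError if the graph contains a cycle."""
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for i, j in arcs:
        succ[i].append(j)
        indeg[j] += 1
    queue = deque(v for v in range(n) if indeg[v] == 0)   # maximal nodes first
    label = {}
    while queue:
        v = queue.popleft()
        label[v] = len(label) + 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:          # all predecessors of w already labeled
                queue.append(w)
    if len(label) < n:
        raise ValueError("graph has a directed cycle")
    return label

# Hypothetical example: arcs given with an arbitrary initial numbering.
arcs = [(3, 1), (1, 0), (3, 2), (2, 0)]
labels = topological_labels(4, arcs)
```

The labeling is generally not unique; any output of the function satisfies condition (6), which is the only property required here.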

Finally, throughout this paper we reserve the word tree to refer to those graphs where all nodes excluding the maximal one have $k_{in} = 1$ and all nodes except the minimal ones display $k_{out} > 1$. Therefore, we distinguish between chains (all nodes with $k_{out} = 1$ excluding the minimal one) and trees.

C. Causal Graphs 

In a causal graph we only consider immediate relations 
between elements i.e. two elements are causally related 
if there exists just one cause-effect event relating them. 
We explicitly neglect those relations between nodes which 
can only be derived by transitivity. A causal relation can 
be illustrated by genetic inheritance in a genealogy. Off- 
spring's characters come from its parents and indirectly 
from its grandparents. Therefore, no direct causal rela- 
tion can be defined between grandparents and grandsons. 
However, it is true that grandparents indirectly deter- 
mine the characters of grandsons, due to the transitive 
nature of the genetic relations. 

In this work we will restrict the use of the term causal relation to refer to direct relations such as direct parent-son relations, as described in the above example. A causal graph $\mathcal{G}(V, E)$ is a directed graph where $V$ are the elements of a set (the members of a family, in the above example) and $E$ are the causal relations that can be defined between the members of $V$. In this work, we restrict the term causal graph to graphs that are acyclic (i.e., DAGs) and connected. The former property avoids conflicts in the definition of the causal flow. The latter property assumes that two non-connected causal structures have no relation between them, and therefore must be considered as two independent systems. Hereafter, we will refer to the set of paths $\Pi_{M\mu}(\mathcal{G})$ as the set of causal paths.
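The two requirements imposed on a causal graph (acyclicity and connectedness of the underlying graph) can be tested as in the following sketch. It assumes at least one node, nodes labeled $0, \ldots, n-1$ and arcs given as pairs; the function names are hypothetical.

```python
def is_acyclic(n, arcs):
    """True if repeatedly removing nodes without incoming arcs empties the graph."""
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for i, j in arcs:
        succ[i].append(j)
        indeg[j] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while stack:
        v = stack.pop()
        removed += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                stack.append(w)
    return removed == n

def is_connected(n, arcs):
    """True if the underlying undirected graph has a single component."""
    nbrs = [set() for _ in range(n)]
    for i, j in arcs:
        nbrs[i].add(j)
        nbrs[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in nbrs[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n

def is_causal_graph(n, arcs):
    """Causal graph in the sense used here: a connected DAG."""
    return is_acyclic(n, arcs) and is_connected(n, arcs)
```

For instance, the genealogy grandparent -> parent -> child, encoded as `[(0, 1), (1, 2)]`, is a causal graph, whereas two disjoint parent -> child pairs are treated as two independent systems.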

III. THE CONCEPTUAL BACKGROUND OF 
HIERARCHY 

In this section we propose the basis for a rigorous eval- 
uation of hierarchy. We begin by defining the features of 
what we consider as the perfect hierarchical structure. As 
will be shown below, our proposed definition of hierarchy 
matches with an ordered structure with special features, 
thereby making an explicit difference between order and 
hierarchy. Therefore, we reserve the term hierarchy to 
refer to a special class of order. Within the framework 
of graph theory, the required conditions naturally match 
those displayed by a tree-like feedforward graph. Then, 



as we shall see, the estimator we propose identifies the 
feed-forward tree topology as perfectly hierarchical. The 
main point of the section is devoted to the definition of a 
quantitative estimator of hierarchy based on two entropy 
measures that captures the intuitive ideas described in 
the introductory section: the definiteness and pyramidal 
organization condition. We stress that the forthcoming 
formalism applies only to the class of causal graphs. 



A. The starting point: Defining the perfect 
Hierarchy 

We are going to refer to a system as perfectly hierarchical if it satisfies the following conditions. Let us consider a system depicted by a causal graph $\mathcal{G}(V, E)$. We say that this graph $\mathcal{G}$ is perfectly hierarchical if the following two conditions hold:

1. Definiteness condition. For every element $v_k \in V \setminus M$ there is only one element $v_i \in V$, $v_i \neq v_k$, such that $\langle v_i, v_k \rangle \in E$. A straightforward consequence of this condition is that

$$m = 1.$$

2. Pyramidal condition. There is a partition $W = \{\omega_1, \ldots, \omega_{|W|}\}$ of the set $V$, i.e.,

$$V = \bigcup_{W} \omega_i; \qquad (\forall \omega_i, \omega_k \in W)\;\; \omega_i \cap \omega_k = \emptyset,$$

by which:

$$(\forall \langle v_i, v_\ell \rangle \in E)(v_i \in \omega_j) \Rightarrow (k_{out}(v_i) > 1) \wedge (v_\ell \in \omega_{j+1}).$$

A direct consequence of the above is that

$$|\omega_1| < |\omega_2| < \ldots < |\omega_{|W|}|,$$

which reflects the pyramidal structure of the graph.

A measure of hierarchy must properly weight the deviations of the studied graph from the above requirements. One could add another condition, by imposing that every node in a given layer dominates the same number of nodes contained in the next downstream layer. In formal terms, this implies that, in addition to the above conditions, we can add a third one, namely:

3. Symmetry condition. It is established by means of

$$(\forall v_i, v_\ell \in \omega_j)(k_{out}(v_i) = k) \Rightarrow (k_{out}(v_\ell) = k),$$

which actually corresponds to a so-called complete $k$-ary tree (Gross and Yellen, 1999). Therefore, in those cases where symmetry is considered as an inherent feature of ideal hierarchy, deviations from symmetrical configurations must also be taken into account in our quantitative approximation.
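The three conditions can be tested directly on a small causal graph. The following sketch is one possible reading of them, under stated assumptions: the input is a connected DAG with nodes labeled $0, \ldots, n-1$, layers are found by propagating a layer index along arcs, and the function name is hypothetical.

```python
def hierarchy_conditions(n, arcs):
    """Return which of the conditions 1)-3) hold for a connected DAG."""
    preds = [[] for _ in range(n)]
    succs = [[] for _ in range(n)]
    for i, j in arcs:
        succs[i].append(j)
        preds[j].append(i)

    # 1. Definiteness: no node has more than one direct predecessor.
    definiteness = all(len(p) <= 1 for p in preds)

    # Try to give every node a layer index such that each arc goes from
    # layer j to layer j + 1 (propagating over the underlying graph).
    layer = {0: 0}
    stack = [0]
    layered = True
    while stack:
        v = stack.pop()
        steps = [(w, 1) for w in succs[v]] + [(w, -1) for w in preds[v]]
        for w, step in steps:
            if w not in layer:
                layer[w] = layer[v] + step
                stack.append(w)
            elif layer[w] != layer[v] + step:
                layered = False          # e.g. a short-cut between layers

    # 2. Pyramidal: layered, and every node with outgoing arcs has k_out > 1.
    pyramidal = layered and all(len(s) != 1 for s in succs)

    # 3. Symmetry: all nodes of the same layer share the same k_out.
    k_out_per_layer = {}
    for v, depth in layer.items():
        k_out_per_layer.setdefault(depth, set()).add(len(succs[v]))
    symmetry = layered and all(len(ks) == 1 for ks in k_out_per_layer.values())

    return {"definiteness": definiteness, "pyramidal": pyramidal,
            "symmetry": symmetry}
```

A complete binary tree satisfies all three conditions, an asymmetrical tree fails only the third, a linear chain fails the pyramidal condition, and an inverted tree fails both the definiteness and the pyramidal conditions.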
FIG. 2: Causal graphs intuitively capturing different degrees of hierarchy. a) A symmetrical tree-like causal graph showing the ideal hierarchical structure assumed in this work. b) An asymmetrical tree-like hierarchical graph. c) A causal graph with a pseudo-pyramidal structure, where the pyramidal condition is violated by the presence of more than one maximal node and of short-cuts between layers. Intuitively non-hierarchical and anti-hierarchical structures are illustrated by d) an anti-hierarchical star graph with $|V| - 1$ maximals exhibiting an inverted pyramidal condition (note that inverting the arrows we would have a tree-like graph), e) an ordered but non-hierarchical linear chain violating the pyramidal condition, and f) an inverted tree-like structure where the definiteness and pyramidal conditions are not satisfied onwards but are completely satisfied backwards.



Let us summarize the above statements 1), 2) and 3) and their consequences. The so-called definiteness condition implies that there is no uncertainty in identifying the premise of a given causal relation, i.e., the node that immediately governs a given node. Taking into account the definition of causal graph (essentially, a connected DAG), this first statement restricts the candidates for perfectly hierarchical structures to tree structures. Note that these trees (including linear chains) have a single root node, i.e., $m = 1$. The pyramidal condition rules out from the set of perfectly hierarchical structures those DAGs having linear chains in their structure. In other words, nodes displaying $k_{in} = k_{out} = 1$ are not allowed.



We observe that, according to the pyramidal condition,

$$(\forall \omega_j, \omega_{j+1} \in W)\quad |\omega_{j+1}| \geq 2|\omega_j|.$$

Therefore, it is straightforward to conclude that the simplest representation of an ideal hierarchical structure is a binary tree (see fig. 2a), in which the above inequality becomes an equality for all successive layers. This is consistent with standard graph theory and the definition of perfect binary tree. Finally, the symmetry condition, optional for our definition of hierarchy, rules out those trees which are not perfectly symmetrical. Whereas the trees shown in figs. 2a and 2b can be considered hierarchical by virtue of conditions 1) and 2), condition 3) makes a distinction between them, only the first one being perfectly hierarchical.

As a final remark, let us note that one can build non-hierarchical and anti-hierarchical structures by simply violating some of the conditions we stated above, namely the pyramidal condition alone (fig. 2e) or both the definiteness and the pyramidal conditions (see fig. 2d and 2f). It is easy to see that a quantitative estimator of hierarchy should account for these limit cases and place intermediate structures, such as some of those depicted in fig. 2, in between.


B. Topological Richness and Reversibility 

The above features describe the perfect hierarchical structure. In this section we go further and provide the basis for a definition of a hierarchical index of a causal graph grounded in the framework of information theory. This index provides a quantitative estimation of how far a given causal graph is from the conditions of a perfect hierarchy.

In the following subsections we will define two entropies for a causal graph, attending to the top-down and bottom-up observations of the causal graph according to the onward and backward flows in the graph. The aim of this mathematical formalism is to quantify the impact of the number of pathways in the causal graph. Specifically, we will consider the balance between the richness of causal paths (a top-down view) and the uncertainty when going back, reversing the causal flow (i.e., a bottom-up perspective). Then, attending to the direction of the flow, we interpret the top-down view as a richness and the bottom-up view as an uncertainty in terms of topological reversibility, as recently introduced in [5]. Arguably, the larger the number of decisions going down, the higher the richness of causal paths. Similarly, the larger the number of alternative pathways to climb up, the larger the uncertainty in recovering the causal flow. In the following subsections we will explore, within the framework of information theory, the relationship between diversity and uncertainty and their impact on the fulfillment of the hierarchy conditions. We begin this section with a brief review of the core concepts of information theory, to be used as our theoretical framework.

According to classical information theory [TJ [7] [14] [26], let us consider a system $S$ with $n$ possible states, whose occurrences are governed by a random variable $X$ with an associated probability mass function formed by $p_1, \ldots, p_n$. According to the standard formalization, the uncertainty or entropy associated to $X$, to be written as $H(X)$, is:

$$H(X) = -\sum_{i \leq n} p_i \log p_i, \tag{7}$$

which is actually an average of $\log(1/p(X))$ among all events of $S$, namely, $H(X) = \langle \log(1/p(X)) \rangle$, where $\langle \ldots \rangle$ is the expectation or average of the random quantity between brackets. Analogously, we can define the conditional entropy. Given another system $S'$ containing $n'$ values or choices, whose behavior is governed by a random variable $Y$, let $\mathbb{P}(s'_i | s_j)$ be the conditional probability of obtaining $Y = s'_i \in S'$ if we already know $X = s_j \in S$. Then, the conditional entropy of $Y$ from $X$, to be written as $H(Y|X)$, is defined as:

$$H(Y|X) = -\sum_{j \leq n} p_j \sum_{i \leq n'} \mathbb{P}(s'_i | s_j) \log \mathbb{P}(s'_i | s_j). \tag{8}$$
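Equations (7) and (8) translate almost literally into code. The sketch below assumes base-2 logarithms (the text leaves the base unspecified) and represents the conditional probabilities as one row per state of $X$; the function names are illustrative.

```python
from math import log2

def entropy(p):
    """H(X) of equation (7), in bits; terms with p_i = 0 contribute nothing."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def conditional_entropy(p_x, p_y_given_x):
    """H(Y|X) of equation (8): p_x[j] = p_j and p_y_given_x[j][i] = P(s'_i | s_j)."""
    return sum(pj * entropy(row) for pj, row in zip(p_x, p_y_given_x))
```

For example, `conditional_entropy([0.5, 0.5], [[1.0, 0.0], [0.5, 0.5]])` averages a fully determined outcome (0 bits) with a fair binary choice (1 bit).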

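1. Topological reversibility: Definiteness condition

The first task is to study the degree of reversibility of causal paths, thereby considering the role of the definiteness condition. This will be evaluated by computing the uncertainty in reversing the process starting from a given node in $\mu$. The formalism used in this section is close to the one developed in [5].

We first proceed to define the probability distribution from which the entropy will be evaluated. Accordingly, the probability to choose a path $\pi_k \in \Pi_{M\mu}$ from node $v_i \in \mu$ by making a random decision at every crossing when reverting the causal flow is:

$$\mathbb{P}(\pi_k | v_i) = \prod_{v_j \in v(\pi_k)} \frac{1}{k_{in}(v_j)}. \tag{9}$$

The conditional entropy obtained when reverting the flow from $v_i \in \mu$ will be:

$$H(\mathcal{G}|v_i) = -\sum_{\pi_k \in \Pi_{M\mu}(v_i)} \mathbb{P}(\pi_k | v_i) \log \mathbb{P}(\pi_k | v_i). \tag{10}$$

The overall uncertainty of $\mathcal{G}$, written as $H(\mathcal{G}|\mu)$, is computed by averaging $H$ over all minimal nodes, i.e.:

$$
\begin{aligned}
H(\mathcal{G}|\mu) &= \sum_{v_i \in \mu} q(v_i) H(\mathcal{G}|v_i) \\
&= -\sum_{v_i \in \mu} q(v_i) \sum_{\pi_k \in \Pi_{M\mu}(v_i)} \mathbb{P}(\pi_k | v_i) \log \mathbb{P}(\pi_k | v_i) \\
&= \sum_{v_i \in \mu} q(v_i) \sum_{\pi_k \in \Pi_{M\mu}(v_i)} \sum_{v_j \in v(\pi_k)} \mathbb{P}(\pi_k | v_i) \log(k_{in}(v_j)) \\
&= \sum_{v_i \in \mu} q(v_i) \sum_{v_j \in V(\Pi_{M\mu}(v_i))} \log(k_{in}(v_j)) \sum_{\pi_k : v_j \in v(\pi_k)} \mathbb{P}(\pi_k | v_i) \\
&= \sum_{v_i \in \mu} q(v_i) \sum_{v_k \in V \setminus M} \phi_{ik}(\mathcal{G}) \log k_{in}(v_k),
\end{aligned}
\tag{11}
$$

where $V(\Pi_{M\mu}(v_i))$ is the set of non-maximal nodes participating in the paths of $\Pi_{M\mu}(v_i)$ and where we assume, unless indicated:

$$(\forall v_i \in \mu)\quad q(v_i) = \frac{1}{|\mu|}.$$

Instead of a vector, now we construct an $(n - m) \times (n - m)$ matrix $\Phi(\mathcal{G})$ accounting for the combinatorics of paths and how they contribute to the computation of entropy:

$$\phi_{ij}(\mathcal{G}) = \sum_{\pi_k : v_j \in v(\pi_k)} \mathbb{P}(\pi_k | v_i).$$

This represents the probability to reach $v_j$ starting from $v_i$. Now we derive the general expression for $\Phi$. To compute the probability to reach a given node, we have to take into account the probability to follow a given path containing such a node, defined in (9). To rigorously connect it to the adjacency matrix, we first define an auxiliary $(n - m) \times (n - m)$ matrix $B(\mathcal{G})$, namely:

$$B_{ij}(\mathcal{G}) = \frac{A_{ij}(\mathcal{G})}{k_{in}(v_j)}, \tag{12}$$

where $v_i, v_j \in V \setminus M$. From this definition, we obtain the explicit dependency of $\Phi$ on the adjacency matrix, namely,

$$\phi_{ij}(\mathcal{G}) = \sum_{k \leq L(\mathcal{G})} \left([B^T]^k(\mathcal{G})\right)_{ij}, \tag{13}$$

and accordingly, we have

$$\phi_{ii}(\mathcal{G}) = \left([B^T]^0(\mathcal{G})\right)_{ii} = 1. \tag{14}$$

Therefore, we have obtained the explicit form of such a conditional entropy, namely:

$$H(\mathcal{G}|\mu) = \sum_{v_i \in \mu} q(v_i) \sum_{v_k \in V \setminus M} \phi_{ik}(\mathcal{G}) \cdot \log k_{in}(v_k). \tag{15}$$

Assuming equiprobability, the above expression leads to:

$$H(\mathcal{G}|\mu) = \frac{1}{|\mu|} \sum_{v_i \in \mu} \sum_{v_k \in V \setminus M} \phi_{ik}(\mathcal{G}) \cdot \log k_{in}(v_k). \tag{16}$$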
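A minimal numerical sketch of equations (12), (13) and (16) is given below. It assumes a connected DAG with at least one arc, nodes labeled $0, \ldots, n-1$, base-2 logarithms and plain nested lists for matrices; the function names are hypothetical. The sum of powers starts at $k = 0$, in agreement with equation (14).

```python
from math import log2

def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    r = len(X)
    return [[sum(X[a][c] * Y[c][b] for c in range(r)) for b in range(r)]
            for a in range(r)]

def backward_entropy(n, arcs):
    """H(G|mu) of equation (16), in bits, for a DAG with nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for i, j in arcs:
        A[i][j] = 1
    k_in = [sum(A[j][i] for j in range(n)) for i in range(n)]
    k_out = [sum(A[i]) for i in range(n)]
    inner = [v for v in range(n) if k_in[v] > 0]       # V \ M
    minimal = [v for v in inner if k_out[v] == 0]      # mu
    r = len(inner)
    # (B^T)_ab = A_ba / k_in(v_a): one random backward step from v_a to v_b.
    Bt = [[A[b][a] / k_in[a] for b in inner] for a in inner]
    # Phi = sum of the powers of B^T, starting from the identity (k = 0).
    power = [[float(a == b) for b in range(r)] for a in range(r)]
    phi = [[0.0] * r for _ in range(r)]
    for _ in range(r):             # B^T is nilpotent: powers >= r vanish
        phi = [[phi[a][b] + power[a][b] for b in range(r)] for a in range(r)]
        power = mat_mul(power, Bt)
    row = {v: a for a, v in enumerate(inner)}
    total = sum(phi[row[i]][row[k]] * log2(k_in[k])
                for i in minimal for k in inner)
    return total / len(minimal)
```

For a tree the result is 0, since every backward step is forced. For the graph with arcs 0 -> 2, 1 -> 2, 2 -> 3 and 1 -> 3, the three backward paths from node 3 have probabilities 1/2, 1/4 and 1/4, and the function returns their entropy, 1.5 bits, as equation (10) requires.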

2. Topological richness: Pyramidal condition 

Let us now estimate the topological richness of a causal graph, i.e., the average amount of information needed to describe a given top-down path within the structure. Let us observe that the kind of question we are trying to answer is the same as the one explored above, but considering the top-down approach. Therefore, the mathematical form of this quantity, to be referred to as $H(\mathcal{G}|M)$, will be formally analogous to the previous one, but considering that we are going onwards according to the causal flow. Thus:

$$H(\mathcal{G}|M) = \frac{1}{m} \sum_{v_i \in M} \sum_{v_k \in V \setminus \mu} \psi_{ik}(\mathcal{G}) \cdot \log k_{out}(v_k), \tag{17}$$

where $m$ is the cardinality of $M$ (the set of maximal nodes) and $\psi_{ik}$ is analogous to $\phi_{ik}$ of equation (16). In this case, the elements $\psi_{ik}$ of the matrix $\Psi$ represent the probability to cross node $v_k$ departing from $v_i \in M$ according to the causal flow. The explicit expression of $\Psi$ is defined from the matrix $B'(\mathcal{G})$:

$$B'_{ij}(\mathcal{G}) = \frac{A_{ij}(\mathcal{G})}{k_{out}(v_i)}; \qquad \psi_{ij}(\mathcal{G}) = \sum_{k \leq L(\mathcal{G})} \left([B']^k(\mathcal{G})\right)_{ij}, \tag{18}$$

and as above, we have

$$\psi_{ii}(\mathcal{G}) = \left([B']^0(\mathcal{G})\right)_{ii} = 1. \tag{19}$$

C. Hierarchy 

As we shall see, the above definitions of information provide the ingredients to define a hierarchy index according to the list of conditions detailed in section III A. Roughly speaking, what we propose in the following lines is to evaluate the balance between the pyramidal structure of the graph and the degree of reversibility of the paths it generates, i.e., the balance between $H(\mathcal{G}|M)$ and $H(\mathcal{G}|\mu)$. However, in order to rigorously characterize hierarchy, we need to properly treat the studied graph attending to the different layers of its feedforward structure. The analysis of the graph structure allows us to identify and quantify deviations from the perfect structure at any level of the graph. The starting point will involve the characterization of a layered structure within the graph, defining a partition $W$ of the set of nodes.



1. Dissecting the structure



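Given the DAG $\mathcal{G}(V, E)$, let us define two partitions of $V$, $W = \{\omega_1, \ldots, \omega_{|W|}\}$ and $\tilde{W} = \{\tilde{\omega}_1, \ldots, \tilde{\omega}_{|\tilde{W}|}\}$. The members of such partitions are the layers of the DAG, obtained by performing either a bottom-up or a top-down leaf removal algorithm [6] [24]. Specifically, the first members of such partitions are defined as:

$$\omega_1 = \{v_i \in V : k_{out}(v_i) = 0\}$$

and

$$\tilde{\omega}_1 = \{v_i \in V : k_{in}(v_i) = 0\}.$$

We observe that $\omega_1 = \mu$ and that $\tilde{\omega}_1 = M$. With the above subsets of $V$ we can define the graphs $\mathcal{G}_1(V_1, E_1)$ and $\tilde{\mathcal{G}}_1(\tilde{V}_1, \tilde{E}_1)$ in the following way:

$$V_1 = V \setminus \omega_1; \qquad E_1 = E \setminus \{\langle v_i, v_k \rangle : v_k \in \omega_1\}$$

and

$$\tilde{V}_1 = V \setminus \tilde{\omega}_1; \qquad \tilde{E}_1 = E \setminus \{\langle v_i, v_k \rangle : v_i \in \tilde{\omega}_1\},$$

respectively. Similarly, we build $\omega_2, \ldots, \omega_{|W|}$ as:

$$\omega_2 = \{v_i \in V_1 : k_{out}(v_i) = 0\}, \quad \ldots, \quad \omega_{|W|} = \{v_i \in V_{|W|-1} : k_{out}(v_i) = 0\}$$

and $\tilde{\omega}_2, \ldots, \tilde{\omega}_{|\tilde{W}|}$ as:

$$\tilde{\omega}_2 = \{v_i \in \tilde{V}_1 : k_{in}(v_i) = 0\}, \quad \ldots, \quad \tilde{\omega}_{|\tilde{W}|} = \{v_i \in \tilde{V}_{|\tilde{W}|-1} : k_{in}(v_i) = 0\},$$

respectively, where the degrees are evaluated in the corresponding pruned graph. We have therefore defined two sequences of subgraphs of $\mathcal{G}$ ordered by inclusion, namely:

$$\mathcal{G}_{|W|-1} \subseteq \ldots \subseteq \mathcal{G}_1 \subseteq \mathcal{G}$$

and

$$\tilde{\mathcal{G}}_{|W|-1} \subseteq \ldots \subseteq \tilde{\mathcal{G}}_1 \subseteq \mathcal{G},$$

where it is easy to observe that:

$$\mathcal{G}_{|W|-1} = M \quad \text{and} \quad \tilde{\mathcal{G}}_{|W|-1} = \mu.$$

FIG. 3: How to obtain the different subgraphs $\mathcal{G}, \mathcal{G}_1, \mathcal{G}_2, \tilde{\mathcal{G}}_1, \tilde{\mathcal{G}}_2$ involved in the evaluation of hierarchy. $\mathcal{G}_1$ and $\mathcal{G}_2$ are obtained through the successive application of a bottom-up leaf removal algorithm, which implies that we remove all nodes having $k_{out} = 0$. $\tilde{\mathcal{G}}_1$ and $\tilde{\mathcal{G}}_2$ are obtained by the successive application of a top-down leaf removal algorithm, thereby removing the nodes having $k_{in} = 0$. We observe that the generation of $\mathcal{G}_i$ and $\tilde{\mathcal{G}}_i$ can imply the breaking of the net (see text).

In fig. 3 we describe the generation of these subsets of graphs for a given toy model of DAG. In summary, we constructed two collections of subsets by finding the layered structure using a bottom-up leaf removal algorithm (successively pruning the elements having $k_{out} = 0$) and a top-down leaf removal algorithm (i.e., pruning the elements having $k_{in} = 0$). Notice that, even if

$$|W| = |\tilde{W}|,$$

one cannot assume

$$\omega_i = \tilde{\omega}_{|W|-i+1},$$

except in symmetrical cases.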
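The two leaf removal procedures can be sketched as follows. The code assumes a DAG with nodes labeled $0, \ldots, n-1$ and arcs given as pairs; the function name and the `bottom_up` flag are hypothetical.

```python
def leaf_removal_layers(n, arcs, bottom_up=True):
    """Return the layers [omega_1, omega_2, ...] (bottom-up) or their
    top-down counterparts as a list of node sets."""
    remaining = set(range(n))
    live_arcs = set(arcs)
    layers = []
    while remaining:
        if bottom_up:
            busy = {i for (i, j) in live_arcs}   # nodes that still have k_out > 0
        else:
            busy = {j for (i, j) in live_arcs}   # nodes that still have k_in > 0
        layer = remaining - busy
        if not layer:
            raise ValueError("graph has a directed cycle")
        layers.append(layer)
        remaining -= layer
        # Degrees in the next step are evaluated in the pruned graph.
        live_arcs = {(i, j) for (i, j) in live_arcs
                     if i in remaining and j in remaining}
    return layers

# Hypothetical asymmetrical example: 0 -> 1 -> 2 and 0 -> 3.
arcs = [(0, 1), (1, 2), (0, 3)]
W = leaf_removal_layers(4, arcs, bottom_up=True)
W_tilde = leaf_removal_layers(4, arcs, bottom_up=False)
```

In this example both partitions have three layers, but the first bottom-up layer is {2, 3} while the last top-down layer is {2}, which illustrates that the two partitions need not mirror each other.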



2. The hierarchy index 

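In order to generate a normalized estimator $f(\mathcal{G})$ (between $-1$ and $1$) accounting for the balance between $H(\mathcal{G}|M)$ and $H(\mathcal{G}|\mu)$ we will define it as:

$$f(\mathcal{G}) = \frac{H(\mathcal{G}|M) - H(\mathcal{G}|\mu)}{\max\{H(\mathcal{G}|M), H(\mathcal{G}|\mu)\}}.$$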



Since both the layered structure and its pyramidal composition must be taken into account, the hierarchical index of the graph must be weighted taking into account the successive layers of the system. This avoids identifying as completely hierarchical those structures that do not perfectly satisfy the pyramidal condition. Therefore, the hierarchical index of a feed-forward net, to be indicated as $\nu(\mathcal{G})$, will be the average among the $|W| - 2$ subgraphs $\mathcal{G}_1, \ldots, \mathcal{G}_k, \ldots$, the $|W| - 2$ subgraphs $\tilde{\mathcal{G}}_1, \ldots, \tilde{\mathcal{G}}_k, \ldots$ and $\mathcal{G}$ itself (note that we average over $2|W| - 3$ objects):

$$\nu(\mathcal{G}) = \frac{1}{2|W| - 3} \left[ f(\mathcal{G}) + \sum_{i < |W| - 1} \left( f(\mathcal{G}_i) + f(\tilde{\mathcal{G}}_i) \right) \right]. \tag{20}$$
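The ratio $f$ that enters equation (20) can be evaluated for a single connected DAG as in the sketch below; the layer average and the treatment of broken subgraphs are not implemented. Two choices here are assumptions rather than part of the text: the entropies are obtained by propagating visiting probabilities in topological order (which gives the same values as the matrix sums) and the backward entropy is computed as the forward entropy of the graph with all arcs inverted; moreover, $f$ is set to 0 when both entropies vanish, a case that the formula above leaves undefined.

```python
from math import log2

def forward_entropy(n, arcs):
    """H(G|M) in bits, by propagating visiting probabilities from each
    maximal node along a topological order (arcs must not be repeated)."""
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for i, j in arcs:
        succ[i].append(j)
        indeg[j] += 1
    maximal = [v for v in range(n) if indeg[v] == 0]
    order, deg, stack = [], indeg[:], maximal[:]
    while stack:                               # Kahn's topological sort
        v = stack.pop()
        order.append(v)
        for w in succ[v]:
            deg[w] -= 1
            if deg[w] == 0:
                stack.append(w)
    total = 0.0
    for root in maximal:
        visit = [0.0] * n                      # probability of crossing a node
        visit[root] = 1.0
        for v in order:
            if succ[v] and visit[v] > 0:
                total += visit[v] * log2(len(succ[v]))
                for w in succ[v]:
                    visit[w] += visit[v] / len(succ[v])
    return total / len(maximal)

def balance_f(n, arcs):
    """f(G) = (H(G|M) - H(G|mu)) / max{H(G|M), H(G|mu)}."""
    h_down = forward_entropy(n, arcs)
    h_up = forward_entropy(n, [(j, i) for i, j in arcs])   # inverted arrows
    top = max(h_down, h_up)
    return 0.0 if top == 0 else (h_down - h_up) / top      # 0/0 case: assumption
```

With these conventions a complete binary tree gives $f = 1$, the same tree with inverted arrows gives $f = -1$, and a diamond-shaped graph, where the onward richness equals the backward uncertainty, gives $f = 0$.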



It is strictly necessary to take into account all these 
subgraphs in order to identify any violation of the hier- 
archy conditions at any level of the structure. We can go 
a step further by imposing symmetry in the pyramidal 
structure as suggested above, to distinguish among dif- 
ferent topologies such as those displayed in figure 
Let us indicate by $\Pi_{M\mu}(\mathcal{G}_k)$ the set of paths from $M$ to $\mu$
present in the graph $\mathcal{G}_k$. The so-called Jensen's inequality
[7] provides an upper bound of the information content
which, in our case, reads:

a) $H(\mathcal{G}_k|M) \leq \log|\Pi_{M\mu}(\mathcal{G}_k)|$,

b) $H(\mathcal{G}_k|\mu) \leq \log|\Pi_{M\mu}(\mathcal{G}_k)|$.

We observe that the bound in a) is only achieved when all
$|\Pi_{M\mu}|$ paths from $M$ to $\mu$ are equiprobable, this
equiprobability being an indicator of symmetry. The same
applies to b), but now we consider the bottom-up estimator,
when paths are considered from $\mu$ to $M$. In this case,
attending to the symmetry condition we can define an
estimator analogous to $f$, namely,

$$g(\mathcal{G}) = \frac{H(\mathcal{G}|M) - H(\mathcal{G}|\mu)}{\log|\Pi_{M\mu}(\mathcal{G})|}.$$
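
The normalization by the number of paths can be illustrated with the sketch below. It counts the paths from maximal to minimal nodes by dynamic programming and takes the two entropies as given inputs; the function names, the use of $\log_2$ and the null value returned when there is at most one path are assumptions made for illustration.

```python
from math import log2


def count_top_to_bottom_paths(nodes, arcs):
    """Number of directed paths from maximal (k_in = 0) to minimal (k_out = 0) nodes."""
    succ = {v: [] for v in nodes}
    has_incoming = set()
    for u, v in arcs:
        succ[u].append(v)
        has_incoming.add(v)
    paths_from = {}

    def count(u):
        # Memoized count of paths from u down to any minimal node.
        if u not in paths_from:
            if not succ[u]:
                paths_from[u] = 1
            else:
                paths_from[u] = sum(count(v) for v in succ[u])
        return paths_from[u]

    return sum(count(s) for s in nodes if s not in has_incoming)


def symmetric_balance(h_forward, h_backward, n_paths):
    """Estimator g: entropy balance normalized by log2 of the path count."""
    if n_paths <= 1:
        # Assumed convention: a single path leaves nothing to normalize.
        return 0.0
    return (h_forward - h_backward) / log2(n_paths)


# Hypothetical lopsided tree: r -> a, r -> b, a -> c, a -> d.
n_paths = count_top_to_bottom_paths(
    ["r", "a", "b", "c", "d"],
    [("r", "a"), ("r", "b"), ("a", "c"), ("a", "d")],
)
```

For the lopsided tree there are three paths, so any forward entropy below $\log_2 3$ gives $g < 1$ even when the backward entropy is zero, which is how the symmetric estimator penalizes unbalanced branching.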
FIG. 4: Different values of the hierarchy index corresponding to some toy DAGs. $\nu(\mathcal{G})$ refers to the hierarchical index where
the symmetry is not considered and $\nu_s(\mathcal{G})$ refers to the hierarchical index where symmetry is taken into account (see text).



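Accordingly, the symmetrized version of the hierarchy
index, $\nu_s(\mathcal{G})$, would be:

$$\nu_s(\mathcal{G}) = \frac{1}{2|W| - 3}\left[ g(\mathcal{G}) + \sum_{i < |W|-1} \left( g(\mathcal{G}_i) + g(\tilde{\mathcal{G}}_i) \right) \right]. \quad (21)$$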

To ensure the consistency of $\nu$ and $\nu_s$ we must overcome
one last conceptual problem. When performing the leaf
removal operation one can break the graph into many
connected components having no connections among them.
Let us indicate as $C_1(i), \dots, C_k(i)$ the set of components
of our graph, each of them obtained from the set
$V_k(i) \subseteq V_i$ of nodes and their arcs. In this case, the
natural way to proceed is to average the individual
contributions of the different connected components of
$\mathcal{G}_i$ or $\tilde{\mathcal{G}}_i$ according to the number of nodes they have
against $|V_i|$ or $|\tilde{V}_i|$, leading to:

$$f(\mathcal{G}_i) = \sum_{C_k(i)} \frac{|V_k(i)|}{|V_i|} f(C_k(i)).$$
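
A possible sketch of this component-weighted average is given below. The helper names are hypothetical, and `score` stands for any function returning the estimator of a connected subgraph from its nodes and arcs; it is a placeholder supplied by the caller, not a routine defined in the text.

```python
def weak_components(nodes, arcs):
    """Connected components of the graph once arc directions are ignored."""
    neighbours = {v: set() for v in nodes}
    for u, v in arcs:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(neighbours[u] - component)
        seen |= component
        components.append(component)
    return components


def component_average(nodes, arcs, score):
    """Average score over components, weighted by their share of nodes."""
    total = 0.0
    for component in weak_components(nodes, arcs):
        # An arc belongs to the component of its source node.
        sub_arcs = [(u, v) for (u, v) in arcs if u in component]
        total += len(component) / len(nodes) * score(component, sub_arcs)
    return total
```

For instance, if a subgraph splits into a three-node component scoring 1 and a two-node component scoring 0, the weighted value is 3/5.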

The same applies for $g(\mathcal{G}_i)$ in the computation of $\nu_s$.
Finally, we impose, for both mathematical and conceptual
consistency, that:

$$\left( \max\{H(\mathcal{G}|M), H(\mathcal{G}|\mu)\} = 0 \right) \Rightarrow \left( \nu(\mathcal{G}) = 0 \right).$$

Furthermore, if $E = \emptyset$ (i.e., the case where the graph
consists of a single node):

$$\nu(\mathcal{G}) = 0.$$

According to this formulation some scenarios would lead
to $\nu(\mathcal{G}) = 0$. The simplest one, just mentioned, holds by
definition and consists of a single node. Another one is
the linear feed-forward chain having 2 or more linked
nodes. It is clear that in these cases
$H(\mathcal{G}|M) = H(\mathcal{G}|\mu) = 0$. It is worth stressing that this
particular situation matches the causal graph of a total
order relation, and therefore we have a way to
differentiate this particular graph from other structures
having null hierarchy. Finally, a third class of
structures belongs to the family of non-hierarchical
graphs. They give $\nu(\mathcal{G}) = 0$ since
$H(\mathcal{G}|M) = H(\mathcal{G}|\mu)$. This is the case of Erdős-Rényi
DAGs or DAG cliques. In these cases, the causal graph
is not hierarchical because all the diversity of paths
generated when crossing the causal flow downwards is
neutralized by the uncertainty in recovering any causal
path backwards.
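
These null cases can be checked numerically on very small graphs with the sketch below. It rests on an assumption that should be compared against the entropy definitions given earlier in the paper: the forward entropy is taken as the entropy of the paths followed by a walker that starts at a uniformly chosen maximal node and picks uniformly among the outgoing arcs, and the backward entropy is the same quantity on the transposed graph. Only the balance for the whole graph is computed, not the layer-averaged index; the function names are hypothetical.

```python
from math import log2


def walk_entropy(nodes, arcs):
    """Path entropy (bits) of a uniform random walk from sources to sinks."""
    succ = {v: [] for v in nodes}
    indegree = {v: 0 for v in nodes}
    for u, v in arcs:
        succ[u].append(v)
        indegree[v] += 1
    sources = [v for v in nodes if indegree[v] == 0]
    visit = {v: 0.0 for v in nodes}
    for s in sources:
        visit[s] = 1.0 / len(sources)
    pending = dict(indegree)
    stack = list(sources)
    entropy = 0.0
    # Nodes are processed in topological order, so visit[u] is final when used.
    while stack:
        u = stack.pop()
        k_out = len(succ[u])
        if k_out:
            entropy += visit[u] * log2(k_out)
        for v in succ[u]:
            visit[v] += visit[u] / k_out
            pending[v] -= 1
            if pending[v] == 0:
                stack.append(v)
    return entropy


def raw_balance(nodes, arcs):
    """Balance between forward and backward entropies of the whole graph."""
    forward = walk_entropy(nodes, arcs)
    backward = walk_entropy(nodes, [(v, u) for (u, v) in arcs])
    largest = max(forward, backward)
    return 0.0 if largest == 0 else (forward - backward) / largest


chain = raw_balance([0, 1, 2], [(0, 1), (1, 2)])
tree = raw_balance([0, 1, 2], [(0, 1), (0, 2)])
inverted_tree = raw_balance([0, 1, 2], [(1, 0), (2, 0)])
clique = raw_balance([0, 1, 2], [(0, 1), (0, 2), (1, 2)])
```

Under this assumption the chain gives zero because both entropies vanish, whereas the three-node clique gives zero because the two entropies are equal and non-zero, in line with the two distinct routes to a null index described above.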



D. Numerical Exploration 

In this section we evaluate the hierarchy of several toy
models in order to intuitively grasp the scope of the
measure. In Fig. 4 we evaluate the hierarchy index (both
the raw one and the symmetrical one) for several
structures, leading to hierarchical, anti-hierarchical and
non-hierarchical structures. The figure illustrates the
impact of the number of maximals and minimals and the
multiplicity of pathways in relation to the existence of a
pyramidal and predictable structure. We observe that
deviations from tree and inverted tree configurations lead
to a non-binary interpretation of hierarchy.

Furthermore, we measured (Fig. 5) the impact in terms of
hierarchy of arc addition preserving the acyclic
character. Starting from two extreme tree graphs (the
feedforward and the inverted ones, respectively) we add
arcs at random until we reach a fully connected
feed-forward structure in both situations. We take as the
starting point of our numerical experiment a binary tree,
$T(V,E)$, containing $n = 15$ nodes. We construct an
inverted binary tree $T'(V,E)$ by the transposition of the
adjacency matrix of $T(V,E)$. In both graphs we say that,
consistently with the ordering property of DAGs, given
an arc $(v_i, v_j) \in E$ then $i < j$. In an iterative process
we construct two new DAGs $\mathcal{G}^i(V, E^i)$ where $i$ labels
the number of additions of new arcs to the underlying
$T(V,E)$ and $T'(V,E)$. The process ends when the graphs
achieve the directed acyclic clique condition, i.e., the
linearly ordered graph $\mathcal{G}^* = (V, E^*)$ containing 15 nodes:

$$(\forall v_i, v_j \in V) : (i > j) \Rightarrow ((v_j, v_i) \in E^*).$$
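
The arc enrichment process can be sketched as follows for the feedforward tree. The heap-style labelling of the binary tree, the function names and the fixed random seed are assumptions chosen so that every arc $(v_i, v_j)$ satisfies $i < j$; the computation of the hierarchy index at each step is left out.

```python
import random


def binary_tree_arcs(n):
    """Arcs of a heap-labelled binary tree: node i points to 2i+1 and 2i+2."""
    return [(i, c) for i in range(n) for c in (2 * i + 1, 2 * i + 2) if c < n]


def enrich_to_clique(n, arcs, rng):
    """Add the missing feed-forward arcs (i < j) one at a time, in random order.

    Returns the arc sets obtained after 0, 1, 2, ... additions; the last
    one is the directed acyclic clique on n nodes.
    """
    current = set(arcs)
    missing = [(i, j) for i in range(n) for j in range(i + 1, n)
               if (i, j) not in current]
    rng.shuffle(missing)
    snapshots = [frozenset(current)]
    for arc in missing:
        current.add(arc)
        snapshots.append(frozenset(current))
    return snapshots


snapshots = enrich_to_clique(15, binary_tree_arcs(15), random.Random(0))
```

Since only arcs with $i < j$ are ever added, acyclicity is preserved at every step. Starting from the 14 arcs of the tree, 91 additions are needed to reach the $15 \cdot 14 / 2 = 105$ arcs of the clique.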
[Figure 5; horizontal axis: number of arcs added]

FIG. 5: The evolution of the hierarchy index. We start with
a binary tree and a binary inverted tree with $n = 15$ and
$|E| = 14$. For both graphs we add, in an iterative way, arcs at
random until we reach the feed-forward clique configuration.
Note that the link addition to the two initial tree-like
structures converges to the same fully connected configuration,
which contains 105 links. For both experiments squares represent
$\nu$ values and triangles $\nu_s$ values. As expected, $|\nu| > |\nu_s|$,
except in limit cases. The small graphs provide a visual clue for
the type of structures we are obtaining through the arc enrichment
process. For every point in the chart, mean and standard
deviation were calculated from 250 replicas. Entropies were
computed considering $\log_2$.



For statistical significance, we performed 250 replicas of
the numerical experiment. Then $\langle \nu \rangle$ and $\langle \nu_s \rangle$ and their
respective standard deviations were calculated for each
set of iterations. Fig. 5 shows that, starting from
an initial value of $\nu = 1$ for $T(V,E)$ and $\nu = -1$ for
$T'(V,E)$, the addition of feed-forward arcs causes a
decrease in absolute value of the hierarchical indexes until
$\nu = \nu_s = 0$, corresponding to a total linear ordered
structure where every possible feed-forward path is included
in the graph. As expected, $|\nu| > |\nu_s|$ except in the extreme
fully connected cases.

We finally test the case of a directed acyclic Erdős-Rényi
(ER) graph $\mathcal{R}(V,E)$. This is an interesting example of a
topologically homogeneous DAG with no correlation between
$k_{in}$ and $k_{out}$ [10]. Graphs were obtained by the
construction of an undirected ER graph $\mathcal{G}_{ER}(V, E_u)$
where $E_u$ is the set of edges (undirected links). The
directed acyclic condition was obtained by a process of
random numbering of nodes [10]. The direction of the
arrows was defined according to

$$(v_i, v_j) \in E : (\{v_i, v_j\} \in E_u \wedge i < j)$$

(i.e., the condition depicted in eq. (6)). Fig. 6 shows a
representative behavior of the null hierarchical character
of $\mathcal{R}(V,E)$ ensembles. Notice that the normal distribution
is centered at zero for both $\nu$ and $\nu_s$, indicating that
such random structures do not have any hierarchical
organization.

FIG. 6: Distribution of $\nu$ and $\nu_s$ for an ensemble of 1,000
replicas of directed acyclic ER graphs with $|V| = 500$ and
$\langle k \rangle = 4$. Numerical results show Gaussian-like distributions
centered at zero. Notice that the $\nu_s$ distribution displays a
narrower variation than the $\nu$ one, in agreement with the
$|\nu| > |\nu_s|$ inequality.
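
The construction of such directed acyclic ER graphs can be sketched as below. The function name and the seed are hypothetical, and two points are assumptions: $\langle k \rangle$ is read as the mean degree of the undirected graph, so that each edge is drawn with probability $p = \langle k \rangle / (|V| - 1)$, and no step is included to enforce that the resulting graph is connected.

```python
import random


def random_er_dag(n, mean_degree, rng):
    """Directed acyclic ER graph obtained by random numbering of the nodes."""
    p = mean_degree / (n - 1)      # edge probability for the assumed mean degree
    label = list(range(n))
    rng.shuffle(label)             # random numbering of the nodes
    arcs = []
    for a in range(n):
        for b in range(a + 1, n):
            if rng.random() < p:   # the undirected edge {a, b} is present
                i, j = sorted((label[a], label[b]))
                arcs.append((i, j))  # orient from the lower to the higher number
    return arcs


er_arcs = random_er_dag(500, 4, random.Random(1))
```

Every arc points from a lower to a higher node number, so the resulting graph cannot contain cycles.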



IV. DISCUSSION 

Hierarchical patterns are known to pervade multiple 
aspects of complexity. In spite of their relevance, it is 
not obvious in general how to formalize them in terms of 
a quantitative theory. This paper presents a definition 
of hierarchy to be applied to the so-called causal graphs, 
i.e., connected, directed acyclic graphs where arcs depict 
some direct causal relation between the elements defin- 
ing the nodes. It is therefore a measure of hierarchy over 
the structure of the causal flow. The conceptual basis 
of this measure is rooted in two fundamental features 
defining hierarchy: the absence of ambiguity in recov- 
ering the causal flow and the presence of a pyramidal 
structure. The hierarchy index presented here weights 
the deviations from such general properties. The specific 
expression for this index is derived using techniques and 
concepts from information theory. It is thus shown that
the requirements of hierarchy naturally fit the tension
between richness in causal paths and the uncertainty in
recovering them, captured by a balance between two
conditional entropies.

Under our previous assumptions, we have shown that
the feed-forward tree is the structure that fully satisfies
the conditions for a perfect hierarchical system.
Interestingly, the view of trees as perfect representations
of hierarchies is a long-standing idea [32]. In this way, our
mathematical formalization establishes a bridge between the
qualitative idea of hierarchy and its quantification. Our
approach allows one to measure the hierarchy of any system
provided that it can be represented as a feedforward causal
graph.

Throughout the paper we emphasized that although 
hierarchy is deeply tied to order, there are strong reasons 
to go beyond it. The most obvious one is that order 
is a well established concept and therefore, there is no 
need for the use of a different word, if we identify it with 
hierarchy. But another important issue must be taken 
into account tied to the intuitive notion of hierarchy we 
propose. We propose that hierarchy must feature the 
pyramidal nature of the connective patterns and that this 
pyramidal structure must be in agreement with the top- 
down nature of the feed-forward flow of causality. 

Information theory proves extremely suitable for defining
a framework to study hierarchy in the terms described
in this paper: the richer the structure (while remaining
reversible, in topological terms), the more hierarchical
it is. In this way, the conditions we defined for a system
to be perfectly hierarchical lead us to conclude that a
feed-forward tree is the perfect hierarchical structure,
since it maximizes the richness without loss of
predictability. It is worth noting that the pyramidal
condition is precisely the key point that guarantees the
predictability. The extreme case is the feedforward
clique (see Fig. 4). Although richness can be increased
through pathway redundancy, this effect cancels out due
to decreased predictability, leading to a non-hierarchical
structure. A particular case is the linear chain. This
representation of a total order relation has null values of
both entropies. In other words, it is a perfectly
predictable system but without richness. It is worth noting
that both cases are not pyramidal structures. By contrast,
anti-hierarchical structures exhibit an inverted pyramidal
structure leading to a different effect. From the
perspective of our formalism, the anti-hierarchical
organization occurs by minimizing richness and
predictability. In consequence, any structure other than an
inverted tree will be less anti-hierarchical. Therefore it
is easy to see that the absolute value of the hierarchical
index measures the closeness to an (anti-)hierarchical tree
structure, capturing somehow the path complexity of the
structure.

Further work should explore the relation of this the- 
oretical achievement within the framework of a formal 
measure of complexity. Additionally, this research could 
be expanded to a more general class of directed graphs 
containing cycles. This latter point could be achieved by
properly defining a measure of how well ordered a net is,
for it is clear that the presence of cycles will generate
conceptual problems in the identification of the causal
flow.